ABSTRACT


A production system technique for recognising regularities in
serial patterns is presented in the context of the letter series
extrapolation problem. The learning technique consists of creating an
ordered set of production rules to represent the concept of a pattern,
such that each rule is a hypothesis about which pattern contexts lead to
which new pattern elements. The production system learning technique is
compared with other series extrapolation methods, and examples of series
concepts learned by a computer implementation of the technique are given.


SERIAL  PATTERN  ACQUISITION:  A PRODUCTION  SYSTEM  APPROACH 

by  D.  A.  Waterman 


A major hurdle to be faced in implementing computer models of complex
learning is the basic task of recognizing regularities in data. This is
particularly critical for so-called "induction" type learning, where a
large number of specific data-representations must be mapped into a single
more general data-representation. Much work has already been done on
induction programs, particularly in the area of pattern recognition
(Selfridge and Neisser, 1963; Zobrist, 1971; Uhr, 1973) and sequence
extrapolation (Feldman, 1963; Simon and Kotovsky, 1963; Uhr, 1964;
Solomonoff, 1964; Ernst and Newell, 1969; Klahr and Wallace, 1970;
Williams, 1972; Hedrick, 1974; Hunt and Poltrock, 1974). A somewhat
different approach to the problem of machine induction will now be
presented.

Ideally, what is needed is a simple uniform technique for recognizing
regularities in data, a technique which can be considered a natural
extension of basic associative learning techniques such as rote learning.
Such a technique would tend to bridge the gap between simple learning like
memorizing the addition table, and complex learning like inducing the
concept of a series.

In this paper a technique for recognizing regularities will be presented
in the context of the series extrapolation problem. No attempt will be made
here to generalize this technique to other induction type problems, although
some sort of generalization seems feasible. First the problem of data
representation will be discussed. Then, the learning technique will be
described as it applies to letter series extrapolation problems. Finally,
examples will be presented of series concepts learned by a computer
implementation of the learning technique.

II.  DATA  REPRESENTATION

Basic associative learning can be thought of as associating a stimulus
A with a response B. This can be represented very naturally as a set of
production rules (Newell and Simon, 1972), since a production rule is
just a set of conditions associated with a particular set of actions. Thus
a portion of the addition table for integers could be represented as the
following ordered set of rules:

    1,1 -> 2
    1,2 -> 3
    1,3 -> 4
    1,4 -> 5

This is interpreted: if you have 1+1 then the sum is 2, else if you have
1+2 it's 3, etc. Only ordered production systems will be considered,
that is, to obtain a result the conditions in the left-hand sides of the
rules are compared to elements in some data base, and the highest priority
rule (topmost rule) whose conditions all match data base elements has its
actions executed.

More complex information, such as letter series concepts, can also be
expressed in production system notation. For example, the concept of the
series CDCDCD can be represented as:

    1.1  C -> D
                                                            (1)
    1.2  D -> C

This can be interpreted: if the last letter in the series is C then the
next is D, else if the last is D then the next is C. It is clear that this
is all that is needed to extend the series indefinitely.

Simple  Letter  Series  Concepts 

The concept of a series will be defined to be a set of extrapolation
rules, as in (1) above, together with a set of initialization rules. The
extrapolation rules contain enough information to extend the series, but
both extrapolation and initialization rules are needed to generate the
series from scratch. Initialization in (1) can be provided by including
* -> C as the last rule of the production system, where the asterisk (*)
represents a condition defined to match any data base, even an empty one.
Thus if no extrapolation rules match the data base then the initialization
rule * -> C will match by definition. In this paper the extrapolation rules
will be referred to as the concept of the series, with the understanding
that the actual concept also includes initialization rules.

Consider the more interesting series, GBDGBGBDGBG. This series is
composed of repeated occurrences of the string GBDGB. Furthermore, its
description does not require the use of predecessor or successor relations
on an alphabet. Series like this which can be described using nothing more
than the equality or same relation will be called simple repetition type
series. A production system (PS) representation of the concept of this
series is shown below.

    2.1  D G B -> G
    2.2  G B -> D
    2.3  D -> G                                             (2)
    2.4  G -> B

This is interpreted: if the last 3 letters in the series are DGB the next
letter is G, otherwise if the last 2 are GB the next is D, etc. The rules
are always applied to the growing end of the series and always result in the
prediction of a single letter. To indicate that the series starts with
G, the initialization rule * -> G is needed at the bottom of the production
system.


In production system (1) the regularities represented are the facts that
D always follows C, and C always follows D. In production system (2) they
are that G always follows the string DGB, D follows all GB's not immediately
preceded by D, G always follows D, and B always follows G. This shows,
at least, that a production system representation is adequate for expressing
the concept of a simple repetition type series in terms of its regularities.
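
The interpretation of ordered production systems described above can be
illustrated with a minimal Python sketch. This is not the original computer
implementation; the names Rule, predict and extend are hypothetical, and a
condition is assumed to be a string of letters compared against the growing
end of the series, with "*" standing for the condition that matches any data
base.

```python
from typing import List, Optional, Tuple

# A rule is (condition, action). The condition is a string of letters that
# must equal the tail of the series; "*" matches anything, even an empty
# series. These names are illustrative assumptions, not the paper's code.
Rule = Tuple[str, str]


def predict(rules: List[Rule], series: str) -> Optional[str]:
    """Return the action of the topmost rule whose condition matches."""
    for condition, action in rules:
        if condition == "*" or series.endswith(condition):
            return action
    return None  # no rule matched, so no prediction is made


def extend(rules: List[Rule], series: str, length: int) -> str:
    """Apply the rules repeatedly until the series has the given length."""
    while len(series) < length:
        next_letter = predict(rules, series)
        if next_letter is None:
            break
        series += next_letter
    return series


# Production systems (1) and (2), each with its initialization rule last.
PS1: List[Rule] = [("C", "D"), ("D", "C"), ("*", "C")]
PS2: List[Rule] = [("DGB", "G"), ("GB", "D"), ("D", "G"), ("G", "B"), ("*", "G")]

print(extend(PS1, "", 6))    # CDCDCD
print(extend(PS2, "", 11))   # GBDGBGBDGBG
```

Because the rules are tried from the top down, the longer condition D G B
takes priority over G B, which is what allows the same two-letter context to
be followed by different letters at different points in the period.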

Sequence  Prediction  Tasks 

In the literature on induction and learning the work most closely
related to production system representation of regularities is the analysis
made by Restle (1967) of subjects performing sequence prediction tasks.
The subjects were given a series of binary events equivalent to a sequence
of 1's and 0's, and were asked to predict each event in the sequence, given
the partial sequence prior to that event. Pretraining and test sequences
were analyzed in terms of generative rules*, i.e., grammar-like replacement
rules that could be used to generate the sequences. The test sequence used
was 111001000111001..., which has a period size of nine. Figure 1 compares
Restle's replacement rules for the test sequence with a production system
representation of that sequence. The replacement rules in no sense
constitute a production system or even a Markov normal algorithm (Markov,
1954; Galler and Perlis, 1970) for generating the series. Instead they
define a grammar which can generate a number of series, including the test
sequence. For example, the top replacement rule generates the seventh item
of the sequence and is interpreted "if you have 1 then replace it with 0".
Thus the test sequence can be generated by starting with 000 and applying
the rules as shown below:

    000 => 1 => 11 => 111 => 0 => 00 => 1 => 0 => 00 => 000.

*These  rules  were  inferred  by  a manual  analysis  of  the  test  sequence,  rather 
than  by  a computer  model  of  the  induction  task. 

    Item Predicted    Replacement Rules    Production Rules

        (7)           1 -> 0               1 0 0 1 -> 0
        (6)           0 0 -> 1             1 1 0 0 -> 1
        (3)           1 1 -> 1 1 1         0 1 1 -> 1
        (2)           1 -> 1 1             0 0 1 -> 1
        (9)           0 0 -> 0 0 0         1 0 0 -> 0
        (1)           0 0 0 -> 1           0 0 -> 1
        (8,5)         0 -> 0 0             0 -> 0
        (4)           1 1 1 -> 0           1 1 -> 0

Figure 1. Comparison of Restle's replacement rules and production
          system rules for the series with period 111001000.

The reason the replacement rules generate series other than the test
sequence is that some rules (7 and 2, 6 and 9) contain identical left
hand sides. Restle found that subjects make the most errors predicting
items that these "optional" rules generate.

The production rules in Figure 1, unlike the replacement rules, represent
the concept of the test sequence since they have associated with them
a general control mechanism (interpreter for ordered PS's) which defines
their use. Notice, however, that the replacement and production rules are
pair-wise isomorphic, i.e., for each replacement rule that predicts a
symbol there is a corresponding production rule that predicts the same symbol.
The production rules which correspond to the "optional" replacement rules
are the most complex, since they have the most symbols in their left hand
sides. This occurs because enough context must be retained in the left hand
side of the production rule to discriminate between similar alternatives.

Thus  within  the  PS  framework  one  would  expect  the  most  errors  during  learning 
to  occur  on  items  generated  by  the  most  complex  rules,  which  corresponds  to 
the  result  obtained  by  Restle.  We  will  now  consider  the  problem  of  generating 
a concept  from  a series  and  will  describe  a learning  technique  capable  of 
creating  the  production  rule  representation  shown  in  Figure  1. 
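
The claim that the ordered production rules of Figure 1 represent the test
sequence can be checked mechanically. The sketch below is only an
illustration under stated assumptions: the rule list is transcribed from the
right-hand column of Figure 1, the helper names are hypothetical, and
checking starts at the fifth item so that the four symbols of context needed
by the longest rules are always available.

```python
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # (condition matched against the tail, predicted symbol)

# Production rules of Figure 1, topmost (highest priority) rule first.
FIGURE_1_RULES: List[Rule] = [
    ("1001", "0"),
    ("1100", "1"),
    ("011", "1"),
    ("001", "1"),
    ("100", "0"),
    ("00", "1"),
    ("0", "0"),
    ("11", "0"),
]


def predict(rules: List[Rule], series: str) -> Optional[str]:
    """Return the action of the topmost rule matching the end of the series."""
    for condition, action in rules:
        if series.endswith(condition):
            return action
    return None


def prediction_errors(rules: List[Rule], sequence: str, start: int = 4) -> List[int]:
    """Positions (0-based) at which the predicted symbol differs from the actual one."""
    return [
        i
        for i in range(start, len(sequence))
        if predict(rules, sequence[:i]) != sequence[i]
    ]


TEST_SEQUENCE = "111001000" * 3  # three periods of the test sequence
print(prediction_errors(FIGURE_1_RULES, TEST_SEQUENCE))  # [] (no errors)
```

Reordering the list, for example by moving 0 0 -> 1 above 1 0 0 -> 0, would
introduce errors, which shows that the ordering is part of the concept.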

III.  BASIC  LEARNING  TECHNIQUE 

A learning technique will now be described that is a simple, uniform
procedure for generating the concept of a series by finding regularities in
the series. In general terms, the technique consists of creating a hypothesis
about a particular type of regularity in the data, adding this hypothesis,
in the form of a production rule, to the current set of hypotheses (the
production system), and then using the data to test the hypotheses. When
the data prove a hypothesis false, a new hypothesis is added above the
error-causing one.

In terms of series concepts, each hypothesis consists of a production
rule formed from a consecutive sequence of letters from the series (the
condition) and the letter assumed to follow that sequence (the action).
The action always consists of just a single letter.* Sequences of the
series are presented to the production system (first letter, first-two
letters, first-three letters, etc.) and it predicts what the next letter
should be. The prediction is checked by comparing it to the next letter in
the actual series. When the prediction is in error a new rule is added
to the system above the error-causing rule. The new rule contains one more
letter in its condition than the error-causing rule and the actual next
letter as its action. The principle is one of minimum local consistency.
A new rule is always a correct statement about the sequence, and is only
created following an error at precisely that point in the sequence. When no
prediction is made (the sequence of letters fails to match any of the rules)
a new rule with a condition equal to the rightmost letter of the sequence
and an action equal to the actual next letter is added.

A learning cycle for a series containing n+1 letters consists of
presenting the system with the first letter, the first-two letters, on up
through the first-n letters, and obtaining a prediction in each case. The
learning phase consists of repeated learning cycles and is complete when a
learning cycle is encountered which produces nothing but correct predictions.
At this point the production system represents the concept of the series and
can be used to predict the extensions of the series.
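
The learning cycle just described can be sketched in Python as follows. This
is a hedged illustration, not the original program. Several details are
assumptions made for the sketch: new rules are placed at the top of the
system (one way of placing them above the error-causing rule), a match on
the default rule is always treated as an error, a new condition never
contains more letters than have been presented, and max_cycles is a safety
limit that is not part of the technique. The names learn and first_match are
hypothetical.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # (condition, action); "*" is the default condition


def first_match(rules: List[Rule], prefix: str) -> Rule:
    """Return the topmost rule whose condition matches the end of the prefix."""
    for rule in rules:
        if rule[0] == "*" or prefix.endswith(rule[0]):
            return rule
    raise ValueError("rule list has no default rule")


def learn(series: str, max_cycles: int = 100) -> List[Rule]:
    """Run learning cycles on a non-empty series until one cycle is error free."""
    # With no rules the first letter cannot be predicted, which creates
    # the default (initialization) rule.
    rules: List[Rule] = [("*", series[0])]
    for _ in range(max_cycles):
        errors = 0
        for i in range(1, len(series)):
            prefix, actual = series[:i], series[i]
            condition, action = first_match(rules, prefix)
            if condition != "*" and action == actual:
                continue  # correct prediction, no new rule needed
            errors += 1
            if condition == "*":
                size = 1  # no real prediction: use the rightmost letter
            else:
                # one more letter than the error-causing rule, if available
                size = min(len(condition) + 1, len(prefix))
            rules.insert(0, (prefix[-size:], actual))
        if errors == 0:
            return rules
    raise RuntimeError("no error-free learning cycle within max_cycles")


for condition, action in learn("GBDGBGB"):
    print(" ".join(condition), "->", action)
```

Under these assumptions the sketch needs three cycles for GBDGBGB, the last
of which produces only correct predictions.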


*Rules which predict more than one letter can also be used to form production
system concepts of series. Such systems can be generated using the same
techniques described in this paper. The problem with such systems is that
they produce unduly complicated rules when the number of letters
predicted by the rules exceeds the period size of the series being
represented.

An example of this technique applied to the series GBDGBGB will now
be presented. Initially the system contains no rules and thus fails to
predict the first letter of the series. This error leads to the creation
of the default rule * -> G. Now G is given to the system and matches the
default rule. This is considered an error* so the rule G -> B is added.
Now GB is presented, does not match G -> B, but does match the default rule.
Since this is considered an error B -> D is added. Next GBD is presented
and also matches only the default rule, leading to the addition of the
rule D -> G. The system now looks like:

    3.1  D -> G
    3.2  B -> D
                                                            (3)
    3.3  G -> B
    3.4  * -> G

Next  GBDG  is  presented,  which  matches  3.3,  predicting  that  the  next  letter 
is  B.  From  the  series  we  see  this  is  indeed  the  next  letter  so  no  new 
rule  is  necessary.  Next,  GBDGB  is  presented  which  matches  3.2,  predicting 
that  the  next  letter  is  D.  From  the  series  we  see  the  next  letter  is 
actually  G,  so  the  rule  G B ->  G is  added.  Next  GBDGBG  is  presented  and 
matches  3.3,  correctly  predicting  B.  Now  the  first  cycle  is  complete,  but 
since  errors  occurred  the  process  starts  over,  and  G is  presented  to  the 
system,  which  is  now: 

    4.1  G B -> G
    4.2  D -> G
    4.3  B -> D                                             (4)
    4.4  G -> B
    4.5  * -> G


*Default (or initialization) rules are always considered to make erroneous
predictions in order to accelerate the learning process.

G matches 4.4 and the correct prediction is made. But now GB is presented
and leads to an incorrect prediction, thus G B -> D is added. GBD and GBDG
both elicit correct predictions but GBDGB matches G B -> D which predicts
D instead of G. Thus D G B -> G is added. After one more correct
prediction the third cycle begins, but this time all predictions are correct
and thus the learning phase terminates. Figure 2 diagrams the rule
acquisition process for this particular series, showing the first two cycles.

The rules learned are:

    5.1  D G B -> G
    5.2  G B -> D
    5.3  G B -> G
    5.4  D -> G                                             (5)
    5.5  B -> D
    5.6  G -> B
    5.7  * -> G

We will consider the concept of the series to be the set of non-redundant*
rules learned, i.e., the rules that can be accessed using this series as
context. We see that G B -> D (rule 5.2) makes G B -> G (rule 5.3)
unconditionally redundant, and B -> D (rule 5.5) contextually redundant
(since, in this particular series, G always occurs before B). The default
rule is always contextually redundant. Removing these redundant rules from
production system (5) gives the concept of the series as shown in production
system (2).**
This learning technique will handle all letter series based on simple
repetition and this is a theory for recognizing regularities in such series.


*See Waterman (1970) for a discussion of redundancy as applied to ordered
production systems.

**The system does not have to remove these rules since their presence cannot
affect system output. In the current implementation the rules are left in;
however, they could be removed by having the system keep track of non-firing
rules, eventually eliminating them.

    a. Cycle 1

       Letters presented    Next letter    Result
       G                    B              no match / 5.6 created
       G B                  D              no match / 5.5 created
       G B D                G              no match / 5.4 created
       G B D G              B              5.6 matches / correct
       G B D G B            G              5.5 matches / 5.3 created
       G B D G B G          B              5.6 matches / correct

    b. Cycle 2

       Letters presented    Next letter    Result
       G                    B              5.6 matches / correct
       G B                  D              5.3 matches / 5.2 created
       G B D                G              5.4 matches / correct
       G B D G              B              5.6 matches / correct
       G B D G B            G              5.2 matches / 5.1 created
       G B D G B G          B              5.6 matches / correct

Figure 2. Diagram of Learning Technique on GBDGBGB
          for first two cycles. (Underlined letters
          at tail of arrow indicate letters used as
          rule left hand sides. Letter at head of
          arrow is rule right hand side).

In fact, it is an instantiation of the compound stimulus hypothesis
(Restle and Brown, 1970) in which a response is assumed to be associated
with some sequence of adjacent past events. Restle and Brown found a
positive but weak relationship between number of errors at a position and
number of previous events required to specify the next event. During
production system learning the number of errors made at each position in
the pattern tends to be proportional to the number of elements in the
condition side of the rule that predicts an element for that position. This
is true because each error during learning is corrected by effectively adding
one new stimulus element to the condition side of the error-causing rule.

Now  we  will  see  how  an  extension  of  this  technique  can  be  applied  to  more 
complex  letter  series  extrapolation  problems. 


IV.  REPRESENTATION  OF  COMPLEX  LETTER  SERIES 

The simplest type of letter series other than those characterized by
simple repetition are those requiring the use of predecessor and successor
operations on the alphabet or any explicitly defined ordered list of symbols.
Examples of such series are ABCDEF, AAABBBCCC, and DEFGEFGH. To represent
series of this type, the system must be able to handle the concept of
variables and must be given the capability for executing both predecessor and
successor operations on the alphabet.

Production  System  Representation 

In the production system representation of complex letter series
variables will be indicated by the symbols x1, x2, x3, ..., and predecessor
and successor operations by an apostrophe (') before or after a variable.
Thus 'x1 represents the predecessor of x1, and x1' its successor. A
variable in the condition side of a rule matches anything and is temporarily
bound to the value of what it matches, thus a bound variable can be used
in the action side of a rule.

With these refinements, the concept of the series ABCDEF can be
represented as x1 -> x1'. This rule is interpreted: if the last letter of the
series is any letter then the next letter in the series is the successor
of that letter. Initialization would be accomplished by the rule * -> A.
Conversely, the series ZYXWVU can be represented as x1 -> 'x1, with * -> Z
for initialization.

Simple repetition can now be represented in a very compact manner,
i.e., consider the two simple repetition series discussed earlier. The
first, CDCDCD, instead of requiring the two rules shown in production system
(1) only requires one rule to represent its concept:

    x1 x2 -> x1.                                            (6)

The second series, GBDGBGBDGBG, instead of requiring the four rules shown
in production system (2) also requires only one rule to represent its
concept:

    x1 x2 x3 x4 x5 -> x1.                                   (7)

It should be clear that any simple repetition series of period n can be
represented by a single rule of the form:

    x1 x2 x3 ... xn -> x1.                                  (8)

Now consider the more complicated series AZCXEVGT. Its concept can
be represented as:

    x1 x2 x1'' -> ''x2
                                                            (9)
    x1 x2 -> x1'' ,

where double apostrophes stand for double predecessor or successor. If
we apply these rules to the series the first rule fails to match (since T
is not the double successor of V) but the second matches, predicting that
the next letter is I. If the rules are now applied to AZCXEVGTI, the first
rule matches, predicting the letter R. Thus (9) can be used to extrapolate
the series as shown below.

    AZCXEVGTIRKPMNO ...                                     (10)
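
The matching of rules containing variables can be sketched in Python as
below. This is an illustrative assumption about how such an interpreter
might work, not the original implementation. Each term is written as a pair
(variable, shift), where a shift of +2 stands for a double successor and -2
for a double predecessor; the alphabet is assumed to be circular so that
shifts are always defined, and conditions are assumed to be non-empty. All
names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
Term = Tuple[str, int]          # (variable, shift): +1 successor, -1 predecessor
Rule = Tuple[List[Term], Term]  # (non-empty condition, action)


def shift(letter: str, k: int) -> str:
    """Move k steps through the alphabet (assumed circular for this sketch)."""
    return ALPHABET[(ALPHABET.index(letter) + k) % len(ALPHABET)]


def apply_rule(rule: Rule, series: str) -> Optional[str]:
    """Match the condition against the end of the series; return the action letter."""
    condition, (action_var, action_shift) = rule
    if len(series) < len(condition):
        return None
    bindings: Dict[str, str] = {}
    for (var, k), letter in zip(condition, series[-len(condition):]):
        value = shift(letter, -k)  # value the variable must have here
        if bindings.setdefault(var, value) != value:
            return None  # conflicts with an earlier binding
    return shift(bindings[action_var], action_shift)


def extrapolate(rules: List[Rule], series: str, count: int) -> str:
    """Add up to count letters, always using the topmost matching rule."""
    for _ in range(count):
        for rule in rules:
            next_letter = apply_rule(rule, series)
            if next_letter is not None:
                series += next_letter
                break
        else:
            break  # no rule matched
    return series


# Production system (9): x1 x2 x1'' -> ''x2  and  x1 x2 -> x1''
PS9: List[Rule] = [
    ([("x1", 0), ("x2", 0), ("x1", 2)], ("x2", -2)),
    ([("x1", 0), ("x2", 0)], ("x1", 2)),
]

print(extrapolate(PS9, "AZCXEVGT", 7))  # AZCXEVGTIRKPMNO
```

The same representation covers rule (6): the rule x1 x2 -> x1 is written
([("x1", 0), ("x2", 0)], ("x1", 0)) and extends CD to CDCDCD.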


Comparison  with  other  Representations 

Other programs have been written which solve letter series extrapolation
problems (Simon and Kotovsky, 1963; Klahr and Wallace, 1970; Williams, 1972;
Hedrick, 1974; Hunt and Poltrock, 1974). The Klahr and Wallace (K&W) model
represents series concepts solely on the basis of inter-period relations,
i.e., relations between letters occupying the same relative position in
adjacent periods. For example, letting s stand for same, n for next, p for
prior, n2 for double next, and p2 for double prior, the concept of series
(10) would be: n2 p2. The number of relations is the period size (in this
case 2), and the representation is called the pattern template. Simple
repetition is represented as a sequence of m same's, where m is the period
size. Thus the concept of GBDGBGBDGBG is just sssss.
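
One possible reading of the pattern template can be sketched in Python. This
is not the Klahr and Wallace program; it only illustrates the idea that each
new letter is obtained from the letter one period earlier. The function name
is hypothetical, the alphabet is assumed to be circular, and the series is
assumed to start at a period boundary and to contain at least one full
period.

```python
from typing import Dict, List

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Inter-period relations expressed as shifts along the alphabet.
OFFSETS: Dict[str, int] = {"s": 0, "n": 1, "p": -1, "n2": 2, "p2": -2}


def extend_with_template(template: List[str], series: str, count: int) -> str:
    """Extend a series by relating each new letter to the one a period back."""
    m = len(template)  # period size equals the number of relations
    letters = list(series)
    for _ in range(count):
        i = len(letters)
        previous = letters[i - m]          # same position, previous period
        k = OFFSETS[template[i % m]]       # relation for this position
        letters.append(ALPHABET[(ALPHABET.index(previous) + k) % len(ALPHABET)])
    return "".join(letters)


print(extend_with_template(["n2", "p2"], "AZ", 13))   # AZCXEVGTIRKPMNO
print(extend_with_template(["s"] * 5, "GBDGB", 6))    # GBDGBGBDGBG
```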

The Hedrick model represents series concepts as a set of unordered,
grammar-like productions which can be used to parse a given input sequence
to determine if it is an instance of the series in question. For example,
the series ABCDEF... would have a representation equivalent to the grammar:*

    s1 -> A B
    s1 -> s1 next(last letter of s1) .

Thus when given the sequence ABCD the system would recognize it as an
instance of s1 (the series ABCDEF...), since the above rules lead to the
parse shown below.

                    s1

              s1          next(s1)

         s1       next(s1)

        A B          C        D


*This is a gross simplification of the actual representation. The rules
are condition-action pairs where the conditions are pattern matches on
both the series and an intermediate semantic net which can be modified by
the actions. Thus the model is effectively a production system
implementation of a grammar.


The  Hedrick  model  learns  the  concept  of  a series  from  a set  of  examples 
(positive  instances)  by  creating  and  generalizing  productions  which 
classify  the  components  of  the  series.  The  model  would  have  to  be  given 
AB,  ABC,  and  ABCD  before  it  could  acquire  the  concept  of  the  above  series. 

The Williams model is part of a more general program for inducing
performance strategies from examples taken from aptitude tests. Series
concepts are represented in a way very similar to the template
representation of Klahr and Wallace. Rules are constructed which define the
inter-period relations same and next, one rule for each element in the
period. For example, series (10) would be

    Rule   Relation   Iteration   Start   Move   Alphabet

    1.     next       2           1       2      Forward English
    2.     next       2           2       2      Backward English.

Rule  1 states  that  the  double  next  (next  with  iteration  2)  relation  on  the 
forward  alphabet  holds  between  letters  which  are  2 positions  apart,  starting 
at  position  1.  Rule  2 is  the  same  except  that  the  starting  point  is 
position  2 and  the  alphabet  is  the  backward  one.  This  representation  is 
essentially  a generalization  of  the  template  representation. 

The Hunt and Poltrock model represents series concepts on the basis of
both inter-period and intra-period relations. This model uses the same three
basic relations used by the other models: same, next, and predecessor. These
relations can be applied either to adjacent letters within a period or to
letters with corresponding positions in adjacent periods. The series concept
is represented as a set of rules, one for each letter in the period, and
relates each letter to some other letter in the series. A series of period n
is shown below.

    s1 s2 s3 ... sn z1 z2 z3 ... zn

The model represents the series as n rules, the first relating z1 to either
sn or s1, the second relating z2 to either z1 or s2, etc. Thus the concept of
the series AAABBBCCC would be:

    z1 = next(s3)
    z2 = same(z1)
    z3 = same(z2) .

Simple repetition is handled by a set of inter-period rules. To illustrate,
the concept of GBDGBGBDGBG is shown below.

    z1 = same(s1)
    z2 = same(s2)
    z3 = same(s3)
    z4 = same(s4)
    z5 = same(s5)

Initialization information, such as s1 = G, s2 = B, s3 = D, s4 = G, and
s5 = B, must also be included as part of the concept.

The  Hunt  and  Poltrock  model  does  not  recognize  multiple  next  or  predecessor 
relations;  nor  does  it  permit  the  description  of  relations  between  letters  with 
non-corresponding  positions  in  adjacent  periods.  Thus  the  concept  of  series 
(10)  cannot  be  described.  However,  this  is  more  a deficiency  of  the  model  than 
of  the  representational  technique  used  to  describe  series  concepts. 

The Simon and Kotovsky (S&K) model represents series concepts
primarily on the basis of intra-period relations. This requires a mechanism
for stepping a pointer forward through an arbitrary alphabet (the successor
operation), a mechanism for resetting the pointer to any arbitrary location
in the alphabet, and a mechanism for constructing arbitrary circular
alphabets. The standard forward and backward circular alphabets are
initially available. The concept of the series AAABBBCCC would be:

    m1 = [alphabet]; A
                                                            (11)
    m1, m1, m1, n(m1) ,

where m1 is the forward alphabet with pointer initialized to A. The n(m1)
represents the act of stepping the pointer to the next position in the
alphabet and does not represent the generation of a letter of the series,
as do the m1's. The concept of series (10) would be:

    m1 = [alphabet]; A
    m2 = [backward alphabet]; Z                             (12)
    m1, n(m1), n(m1), m2, n(m2), n(m2) .

Here two separate alphabets, m1 and m2, are required. Simple repetition can
be handled by creating an arbitrary alphabet from the letters comprising
one period of the series, i.e., the concept of GBDGBGBDGBG would be:

    m1 = [GBDGB]; G
    m1, n(m1)

A comparison of the PS, S&K, and K&W representations is given in
Table 1, using series taken from Simon and Kotovsky (1963). Note that in
the S&K notation inter-period relations are implicit rather than explicit,
while in the K&W notation intra-period relations cannot be described at
all. However in production system notation both can be explicitly described,
as illustrated by the first two columns of the Table.

Table 1. Comparison of Production System, Simon & Kotovsky, and Klahr &
         Wallace notations for representing letter series concepts.
         (Extrapolation rules are shown above the dotted lines,
         initialization rules below the dotted lines.)

One advantage of using a PS representation is that it permits
initialization rules to be represented in a form identical to extrapolation
rules. Furthermore, there is a certain degree of independence between
initialization and extrapolation which makes it possible to extrapolate a
given series without using initialization information. With the K&W
representation this is also possible, but the system must effectively
regenerate the series from scratch in order to extrapolate it. To extrapolate
a series using S&K extrapolation rules the system must obtain the
initialization information from the series (a non-trivial task) and then use
it to regenerate the series from scratch.

Representation  of  Hierarchical  Sequences 

Sequential behavior can be analyzed in terms of hierarchical systems
(Chomsky, 1963; Restle, 1970) and we will now compare one such analysis
with a corresponding production system analysis. Restle (1970) developed a
notation for describing a hierarchical sequence as a series of nested
operators: Ti, R and M, which can transpose (add or subtract by i), repeat,
or mirror (reflect) sequences given as arguments. For example, T+1(3) is
(3 4) and T+3(3 4) is (3 4 6 7). Similarly, R(3) is (3 3) and R(1 2) is
(1 2 1 2)*. Thus the pattern 3 1 3 1 6 4 6 4 can be represented as
T+3(R(T-2(3))). This is equivalent to representing the pattern as a regular
binary tree.
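
The transpose and repeat operators can be written down directly from the
examples given above. The following Python sketch is only an illustration
with hypothetical function names T and R; sequences are assumed to be tuples
of integers, and the mirror operator M is omitted because it is not defined
in this paper.

```python
from typing import Tuple

Sequence = Tuple[int, ...]


def T(i: int, seq: Sequence) -> Sequence:
    """Transpose: the sequence followed by a copy with i added to each element."""
    return seq + tuple(x + i for x in seq)


def R(seq: Sequence) -> Sequence:
    """Repeat: the sequence followed by an identical copy."""
    return seq + seq


print(T(1, (3,)))               # (3, 4)
print(T(3, (3, 4)))             # (3, 4, 6, 7)
print(R((1, 2)))                # (1, 2, 1, 2)
print(T(3, R(T(-2, (3,)))))     # (3, 1, 3, 1, 6, 4, 6, 4)
```

Each operator doubles the length of its argument, which is why a nesting of
three operators applied to a single element yields a pattern of eight
elements.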

The hierarchical pattern 3 1 3 1 6 4 6 4 can also be represented by
the following production system:

    13.1  x1 x2 x3 x4 -> x1'''
    13.2  x1 x2 -> x1                                       (13)
    13.3  x1 -> ''x1


*For  a description  of  the  "mirror"  operation  see  Restle  (1970). 




The pattern is generated from the initial element 3 by one application of
rule 13.3 (to produce 3 1), two applications of 13.2 (to produce 3 1 3 1)
and four applications of 13.1 (to produce 3 1 3 1 6 4 6 4). Note that
each production rule is analogous to one of the Restle operators (or one
level in the corresponding binary tree). Hierarchical sequences based on
transposition and repetition can be described in terms of this PS notation
since these operations map directly into the predecessor, successor, and
same relations used by the PS's in this paper.
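
The generation process just described can be sketched in Python under a
simplifying assumption: each rule is reduced to a pair (context length,
offset), meaning that the rule matches the last context-length elements
and emits the first of them plus the offset. The function name generate
and this encoding are illustrative only and are not taken from the
PAS-II implementation.

```python
def generate(start, rules, length):
    """Grow a sequence from `start` until it has `length` elements.

    rules is an ordered list of (context_length, offset) pairs, most
    specific first. The first rule whose context fits the sequence fires.
    Returns the sequence and the 1-based positions of the rules fired.
    """
    seq, fired = [start], []
    while len(seq) < length:
        for number, (context_length, offset) in enumerate(rules, start=1):
            if len(seq) >= context_length:
                seq.append(seq[-context_length] + offset)
                fired.append(number)
                break
        else:
            raise ValueError("no rule matches the current context")
    return seq, fired


# Rule set (13): four-element context with triple successor, two-element
# context with "same", one-element context with double predecessor.
RULES_13 = [(4, +3), (2, 0), (1, -2)]
sequence, fired = generate(3, RULES_13, 8)
print(sequence)  # [3, 1, 3, 1, 6, 4, 6, 4]
print(fired)     # [3, 2, 2, 1, 1, 1, 1]
```

The list of fired rules shows one application of the third rule, two of
the second and four of the first, matching the counts given above.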

Greeno and Simon (1974) have analyzed the problem of converting sequence
information stored as a hierarchy of operators into serially ordered
performance. The analysis was made on information represented in Restle's
notation and considered questions about the requirements made by the
interpretive process on memory storage and computational complexity. Three
interpretive processes (push down, recompute, and doubling) were presented
for producing the sequence 5 6 2 3 4 5 1 2 from T-1(T-3(T+1(5))), and each
was analyzed in terms of storage and computational requirements. One of
these processes, called doubling, involves the application of identical
operators several times in succession, as illustrated in Figure 3. This
particular interpretive process is identical to the one used to interpret
a production system representation of this pattern. For example, the above
sequence can be represented in PS form as:

14.1  x1 x2 x3 x4 -> 'x1

14.2  x1 x2 -> '''x1                 (14)

14.3  x1 -> x1'

If we map rule 14.1 into the operator T-1, 14.2 into T-3, and 14.3 into
T+1, we see that the sequence of rule applications which generates the
series is the same sequence given in Figure 3. Greeno and Simon found that
the doubling interpretation process, when compared with the other two,
minimized the number of operator applications and operator retrievals from
memory, while maximizing the amount of short-term memory required.* Thus
we conclude that a production system representation of serial patterns
implies a process for which computational complexity has been reduced at
the expense of memory requirements.

Figure 3. Doubling interpretive process for producing a sequence from
T-1(T-3(T+1(5))). (Taken from Greeno and Simon, 1974).

V.  EXTENSION  OF  LEARNING  TECHNIQUE 

The primary interest here is in developing a simple uniform technique
for generating the concept of a complex letter series. The creation of
compact or minimal sets of rules is considered to be of secondary
importance.

An extension of the previously discussed learning technique will now be
presented. Since series based on alphabets are used, rules must now be
generalized before being added to the system. For example, the series ABCD
cannot be extrapolated from the rules A -> B, B -> C, and C -> D. A
generalized version of these rules, namely x1 -> x1', provides the needed
predictive power. But the need to generalize rules leads to another
problem: that of determining which relations between letters should be
made explicit in the generalization. This is a non-trivial problem because
the system will either make errors or become bogged down in backtracking
if spurious relations are made explicit (Waterman, 1974). This problem is
solved by hypothesizing a period size and then making explicit only
relations between letters which occupy the same relative position within
adjacent periods.

This method for limiting the search for relevant relations is called the
template strategy. Since this technique deals only with inter-period
relations it creates production systems similar to those shown in column 1
of Table 1.

*For m operators the number of operator applications was 2^m - 1, the
number of operator retrievals was m, and the maximum memory capacity was
2^(m-1).
Example of Production System Series Extrapolation

The new production system learning technique is identical to the
earlier one with the following exceptions:

1.  Only one cycle through the series is necessary, regardless
of errors.

2.  New rules are added immediately above the error-causing rule,
rather than above all the current rules.

3.  A generalized version of each rule is added to the system,
rather than a specific one, and only inter-period relations
are made explicit.

4.  Period size is hypothesized, in order from 1 to n, where n
is the length of the given series. For each period size
hypothesis one learning cycle is attempted. The cycle is
aborted and the period size hypothesis incremented whenever
(a) no relation can be found between letters occupying the
same relative position in adjacent periods, or (b) the
number of rules added exceeds the period size hypothesis.

An example using the series ABMCDMEF will illustrate this procedure. The
initial period size hypothesis is 1, and no rules are present. The context
A is presented to the system; since there are no rules an error results
and the rule A -> B is generalized and added to the system. Since the
period is assumed to be 1 the relation between A and B is made explicit
and A -> B is added as x1 -> x1'. Next the context AB is presented, which
matches the rule just added, and C is predicted rather than the correct
letter M. Thus a new rule must be added. However, this would make the
number of rules (2) larger than the period size (1), so the cycle is
aborted and starts over with a period size hypothesis of 2 and no rules
present.

Now the context A is presented; it leads to an error since no rules
are present, and the rule A -> B is generalized and added as x1 -> B, since
A and B are now both in the same period. Next AB is presented, which
matches the rule just added, and B is predicted rather than the correct
letter M. Thus the rule A B -> M is generalized and added, except that here
the generalization fails since no relation can be found between A and M
(no relations beyond triple predecessor and triple successor are
considered). As before the cycle is aborted and starts over with no rules
present and a period size hypothesis of 3.

Again the context A leads to an error and x1 -> B is added to the
system. Then AB is presented, leading to the erroneous prediction of B.
Thus A B -> M is generalized and added as x1 x2 -> M, since A, B, and M are
all in the same period. The set of rules is now:

15.1  x1 x2 -> M    (initialization)
                                        (15)
15.2  x1 -> B       (initialization)

Here "initialization" indicates that these are intra-period rules needed
for initialization of the series but not for extrapolation. To accelerate
learning this type of rule is always considered to lead to an erroneous
prediction. Next the context ABM is presented, matching 15.1 which predicts
M rather than the correct letter, C. So the rule A B M -> C is generalized
and added above 15.1 to produce:

16.1  x1 x2 x3 -> x1''    (1)

16.2  x1 x2 -> M          (initialization)     (16)

16.3  x1 -> B             (initialization)

where the (1) indicates that this is the first rule added that counts
relative to the abort decision based on the number of rules added (only
inter-period rules are counted). Next the context ABMC is presented, which
matches 16.1 and correctly predicts D as the next letter. Now the context
ABMCD is presented, again matching 16.1 but incorrectly predicting O. Thus
the rule B M C D -> M is generalized and added to produce:

17.1  x1 M x3 x1'' -> M   (2)

17.2  x1 x2 x3 -> x1''    (1)
                                               (17)
17.3  x1 x2 -> M          (initialization)

17.4  x1 -> B             (initialization)

Finally, when ABMCDM and ABMCDME are presented they elicit correct
predictions and the learning phase is complete. Now the entire series
ABMCDMEF is presented and the correct extrapolation, letter M, is made. The
concept of the series is considered to be the extrapolation rules (17.1 and
17.2) plus the initialization rules (17.3 and 17.4) shown in (17).*
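
The walk-through above can be condensed into the following Python sketch.
It is one reading of the four exceptions and of the worked example, not the
PAS-II program: the names learn, generalize, predict and show are
hypothetical, total generalization on same is applied only to the first
inter-period rule of a cycle, and the code assumes that a new rule takes
one more context letter than the rule that failed, which is inferred from
the example rather than stated in the text. Rules spanning more than two
periods are handled only approximately.

```python
import string

ALPHABET = string.ascii_uppercase  # circular alphabet: Z is followed by A


def shift(letter, k):
    """Letter k places after `letter` (before it when k is negative)."""
    return ALPHABET[(ALPHABET.index(letter) + k) % 26]


def relation(a, b, max_step=3):
    """Signed step from a to b if it is same (0) or at most a triple
    successor or predecessor; None when no such relation exists."""
    diff = (ALPHABET.index(b) - ALPHABET.index(a)) % 26
    if diff <= max_step:
        return diff
    return diff - 26 if 26 - diff <= max_step else None


def generalize(letters, period, total_same):
    """Generalize context + predicted letter. A term is a literal letter
    or a pair (i, k) meaning variable x(i+1) shifted by k."""
    terms = [(i, 0) for i in range(len(letters) - 1)] + [letters[-1]]
    for j in range(period, len(letters)):
        i = j - period                       # same position, previous period
        k = relation(letters[i], letters[j])
        if k is None:
            return None                      # abort condition (a)
        if k == 0 and not total_same:
            terms[i] = terms[j] = letters[i]     # partial generalization
        else:
            terms[j] = (i, k)
    return terms


def predict(terms, series):
    """Prediction of a rule whose context matches the end of series."""
    context = terms[:-1]
    if len(context) > len(series):
        return None
    tail = series[-len(context):]

    def value(term):
        return term if isinstance(term, str) else shift(tail[term[0]], term[1])

    if all(value(t) == c for t, c in zip(context, tail)):
        return value(terms[-1])
    return None


def show(terms):
    """Render a rule in the paper's notation, e.g. x1 M x3 x1'' -> M."""
    def text(term):
        if isinstance(term, str):
            return term
        i, k = term
        return "'" * max(-k, 0) + f"x{i + 1}" + "'" * max(k, 0)
    return " ".join(text(t) for t in terms[:-1]) + " -> " + text(terms[-1])


def learn(series):
    """Return (period, rules as strings with the top rule first), or None."""
    for period in range(1, len(series) + 1):
        rules, counted, aborted = [], 0, False   # rules: (terms, is_init)
        for n in range(1, len(series)):
            context, target = series[:n], series[n]
            hit = next((r for r, (t, _) in enumerate(rules)
                        if predict(t, context) is not None), None)
            if hit is not None:
                terms, is_init = rules[hit]
                if not is_init and predict(terms, context) == target:
                    continue                     # correct prediction
            # Assumption: one more context letter than the failing rule.
            size = 1 if hit is None else len(rules[hit][0])
            letters = series[n - size:n + 1]
            is_init = len(letters) <= period     # intra-period rule only
            counted += 0 if is_init else 1
            new = None if counted > period else generalize(
                letters, period, total_same=(counted == 1))
            if new is None:
                aborted = True                   # abort condition (a) or (b)
                break
            rules.insert(0 if hit is None else hit, (new, is_init))
        if not aborted:
            return period, [show(t) for t, _ in rules]
    return None


period, rules = learn("ABMCDMEF")
print(period)
for rule in rules:
    print(rule)
```

Run on ABMCDMEF, the sketch rejects period sizes 1 and 2 and stops at
period size 3 with the four rules listed in (17).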

Rule generalization is straightforward and requires the rule, the
series, and the hypothesized period size. For example, if the rule is
B B C -> B and the series is ABBCBAE with period size 3, then, as shown
below, arrows can be drawn between letters whose relations are to be made
explicit.

      A B B / C B A / E

Now the rule B B C -> B has only one such arrow, thus only the relation
between the first and last B is made explicit. Since it is a same relation
it can be made explicit by using total generalization to get
x1 x2 x3 -> x1 or partial generalization to get B x2 x3 -> B. Partial
generalization can only be used for two letters connected by the same
relation, and never for letters connected by predecessor or successor
relations.


The learning technique just described works when only total
generalization on same is permitted and also when only partial
generalization on same is permitted. But in the former case the concept
learned for series containing simple repetition is much more compact. Thus
in the computer implementation of this learning technique, total
generalization on same occurs on the first inter-period rule added to the
system during each cycle, and partial generalization occurs on all the
subsequent rules added. Since total generalization on same always leads to
a single rule representation of simple repetition series, this procedure is
equivalent to the heuristic: "check to see if you have a simple repetition
series before proceeding with the more complex series extrapolation
methods".

*The default rule * -> A is also needed to generate the series from scratch.
In the current computer implementation of the extended learning technique
all initialization rules, except the default one, are learned during
the normal execution of the technique. Thus to generate complete series
the system must be given either the first letter, the default rule, or a
trivial program modification which causes automatic generation of the
default rule.


Comparison  with  Other  Series  Extrapolation  Techniques 

The production system learning technique just illustrated is a method
for learning series concepts based solely on inter-period relations. In
this respect it is similar to the K&W template matching technique for
series extrapolation. There are some differences, however. First, for
template matching the series must always exhibit two complete periods or
the template will not be complete, and no predictions can be made. In the
PS technique two periods are not always required since the method
automatically hypothesizes that the inter-period relations not yet
specified are similar to those already learned. For example, the template
technique fails on the series AABBACB, even though this can be extrapolated
as AABBACBDAEBFA. It can be thought of as the series ABABABA interleaved
with ABCDEF. The PS technique* applied to AABBACB produces the series
concept:

      x1 x2 x3 x4 x1 -> x2''
                                        (18)
      x1 x2 x3 x4 -> x1

from which the correct extrapolation can be made. A second difference
between the template technique and the PS technique is that the former
always finds a concept based on the shortest period whereas the latter
may find a concept based on some multiple of the shortest period. Even
when this occurs the predictions made by the PS technique are identical
to those made by the template technique.

*For this example only, the technique consists of using only total
generalization on same.
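
To make the extrapolation step concrete, the following Python sketch
applies an ordered rule list such as (18) to the end of a series. It
assumes, as in the examples of this paper, that a rule's context is
matched against the most recent letters and that the first matching rule
fires; the function names and the string encoding of rules are
hypothetical and serve only as illustration.

```python
import re
import string

ALPHABET = string.ascii_uppercase  # circular alphabet: Z is followed by A
TERM = re.compile(r"^('*)x(\d+)('*)$")  # e.g. x1, x2'', 'x1


def shift(letter, k):
    return ALPHABET[(ALPHABET.index(letter) + k) % 26]


def parse(rule):
    """Split 'x1 x2 -> x1' into (['x1', 'x2'], 'x1')."""
    lhs, rhs = rule.split("->")
    return lhs.split(), rhs.strip()


def evaluate(term, bindings):
    """Value of a term: a literal letter, or a bound variable shifted by
    its leading (predecessor) and trailing (successor) primes."""
    m = TERM.match(term)
    if m is None:
        return term
    before, var, after = m.groups()
    if var not in bindings:
        return None
    return shift(bindings[var], len(after) - len(before))


def match(lhs, series):
    """Bind variables against the last len(lhs) letters, or return None."""
    if len(lhs) > len(series):
        return None
    bindings = {}
    for term, letter in zip(lhs, series[-len(lhs):]):
        m = TERM.match(term)
        unprimed = m is not None and not m.group(1) and not m.group(3)
        if unprimed and m.group(2) not in bindings:
            bindings[m.group(2)] = letter      # first occurrence binds
        elif evaluate(term, bindings) != letter:
            return None
    return bindings


def extrapolate(rules, series, count):
    """Return the next `count` letters predicted by the ordered rules."""
    parsed = [parse(rule) for rule in rules]
    current = series
    for _ in range(count):
        for lhs, rhs in parsed:
            bindings = match(lhs, current)
            if bindings is not None:
                current += evaluate(rhs, bindings)
                break
        else:
            raise ValueError("no rule matches the current context")
    return current[len(series):]


CONCEPT_18 = ["x1 x2 x3 x4 x1 -> x2''", "x1 x2 x3 x4 -> x1"]
print(extrapolate(CONCEPT_18, "AABBACB", 6))  # DAEBFA
```

The first rule fires only when the letter four positions back recurs, so
it supplies the ABCDEF strand, while the second rule supplies the
alternating A and B strand.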

A production  system  learning  technique  based  on  intra-period  relations 
has  not  yet  been  developed,  but  might  prove  to  be  a promising  area  for 
continued  research.  One  of  the  major  problems  with  this  approach  is  the 
difficulty  during  the  learning  phase  of  distinguishing  between  relevant 
and spurious intra-period relations. Both the Simon-Kotovsky and
Hunt-Poltrock models dealt with this problem with a certain degree of
success; thus the heuristics they used should provide useful guidelines
for an adaptive production system implementation.


Computer  Implementation  of  Production  System  Learning  Techniques 

Both the basic learning technique (applied to simple repetition series)
and the extended learning technique (applied to series using circular
alphabets) have been realized as computer programs written in the PAS-II
system (Waterman and Newell, 1973). Each program is a short production
system which can modify itself by adding new production rules. The rules
added by the system represent the concept of the series being learned. A
complete description of these self-modifying production systems is given
elsewhere (Waterman, 1974).

Examples of series concepts learned by SC1, the program employing the
basic learning technique, are shown in Figure 4. Both Figure 5* and
Table 1 (first column) contain series concepts learned by SC2, the program
employing the extended learning technique. Here redundant rules have been
eliminated from the concept descriptions. Figure 6 contains more difficult
series concepts learned by SC2. The S&K program, or any program based on
intra-period relations, would tend to have difficulty with these series.
Note that series 4 in Figure 6 is another that the K&W template matching
procedure would be unable to solve.

*The series in Figure 5 were taken from Williams (1972).


     Series          Concept             Predictions

 1.  AABAABA         B -> A              AAB
                     A A -> B
                     A -> A

 2.  ABACABA         C -> A              CAB
                     B A -> C
                     B -> A
                     A -> B

 3.  GBDGBGBDG       D G B -> G          BGB
                     G B -> D
                     D -> G
                     G -> B

 4.  BBABCBBBBA      B B B B -> A        BCB
                     B B B -> B
                     C B B -> B
                     C -> B
                     A B -> C
                     A -> B
                     B -> B

Figure 4.  Series Concepts Learned by the SC1 Series Extrapolation Program.


     Series          Concept                                 Prediction

 1.  CDCDCD          x1 x2 -> x1                             CDC

 2.  AAABBB          x1 x2 x3 -> x1'                         CCC

 3.  ATBATAATB       x1 x2 x3 x4 x5 x6 -> x1                 ATA

 4.  RSRTRURV        R x2 R -> x2'                           RWR
                     x1 x2 -> x1

 5.  ABMCDMEF        x1 M x3 x1'' -> M                       MGH
                     x1 x2 x3 -> x1''

 6.  DEFGEFGH        x1 x2 x3 x4 -> x1'                      FGH

 7.  QXAPXBQXA       x1 x2 x3 x4 x5 x6 -> x1                 PXB

 8.  ABCDABCEA       C x2 x3 x4 C -> x2'                     BCF
                     x1 x2 x3 x4 -> x1

 9.  MABMBCMCDM      M x2 x3 x4 x5 x6 M -> x2''              DEM
                     x1 x2 M x4 x5 x6 x1'' x2'' -> M
                     x1 x2 x3 x4 x5 x6 x1'' -> x2''
                     x1 x2 x3 x4 x5 x6 -> x1

10.  URTUSTU         U x2 x3 U -> x2'                        TTU
                     x1 x2 x3 -> x1

11.  MNLNKNJ         x1 N 'x1 -> N                           NIN
                     x1 x2 -> 'x1

12.  ABYABXAB        B x2 x3 B -> 'x2                        WAB
                     x1 x2 x3 -> x1

13.  RSCDSTDE        x1 x2 x3 x4 -> x1'                      TUE

14.  NPAOQAPR        x1 A x3 x1' -> A                        AQS
                     x1 x2 x3 -> x1'

15.  MNOMOOMPO       M x2 x3 M -> x2'                        MQO
                     x1 x2 x3 -> x1

16.  WXAXYBY         x1 x2 x3 -> x1'                         ZCZ

17.  JKQRKLRS        x1 x2 x3 x4 -> x1'                      LMS

18.  PONONMNM        x1 x2 x3 -> 'x1                         LML

19.  CEGEDEHEEE      x1 E x3 x4 x1' -> E                     IEF
                     x1 x2 x3 x4 -> x1'

Figure 5.  Series Concepts Learned by the SC2 Series Extrapolation Program.

Even though the SC2 program is an extension of SC1 they do not always
make the same predictions, particularly when given ambiguous series. For
example, given the series ABBA, SC1 makes the simple extrapolation BBA...,
or simple repetition of period 3. However, SC2 (and the K&W program) would
find a more complex extrapolation of period 2, i.e., CZD... For
unambiguous simple repetition series, SC1 and SC2 always make the same
predictions, but SC2 produces a simpler concept after a much greater
computational effort.

VI.  CONCLUSION 

A learning  technique  has  been  presented  for  finding  regularities  in 
sequential  patterns.  It  consists  of  nothing  more  complex  than  forming  an 
ordered  set  of  hypotheses  about  which  pattern  contexts  lead  to  which  new 
pattern elements. The learning system starts with very general hypotheses,
i.e., rules which apply to particular classes of patterns. As these rules
are proven erroneous their generality is reduced by adding new rules above
them which apply to subclasses of these patterns. More specifically,
learning proceeds by first assuming that only one element in a particular
pattern context is relevant and then, as this is proven false, falling
back to the less general assumption that other elements in that pattern
context are also relevant. When the learning phase is complete, the system
has learned which pattern elements are relevant given any particular
pattern context.




     Series            Concept                               Predictions

 1.  ABCCDEFFG         x1 x2 x3 x4 -> x1'''                  HII

 2.  DDCCDDBDEAD       x1 D x3 'x1 -> D                      FZD
                       D x2 x3 D -> x2'
                       x1 x2 x3 -> 'x1

 3.  BADBADCE          x1 x2 x3 x4 'x1 -> x2'''              ZGB
                       x1 x2 x3 x4 -> 'x1

 4.  AAABBBCACBD       x1 A x3 x4 x5 x6 x1'' -> A            BEA
                       x1 B x3 x4 x5 x6 x1'' -> B
                       x1 x2 x3 x4 x5 x6 -> x1''

 5.  ABCBCDCDEF        C x2 x3 x4 C x2'' -> x3''             CFG
                       x1 C x3 x4 x1'' -> C
                       x1 x2 x3 x4 -> x1''

 6.  ADUACUAEUABUAF    x1 U A x4 x5 x6 'x1 U A -> x4'        UAA
                       U A x3 x4 x5 x6 U A -> 'x3
                       A x2 x3 x4 x5 x6 A -> x2'
                       x1 U x3 x4 x5 x6 'x1 -> U
                       x1 x2 x3 x4 x5 x6 -> x1

Figure 6.  Difficult Series Concepts Learned by the SC2 Series
Extrapolation Program.
The preceding remarks apply to both the basic learning technique
and to the extension of that technique. However, in the extension of the
basic learning technique an additional generalization process is present.
This is the process of characterizing the relations between elements of
the pattern in a very general way before adding the rule containing these
pattern elements to the system. Here a specific rule is made more general
subject to constraints imposed by the current strategy for recognizing
relations. Only one such strategy (the template strategy) was presented
in this paper. As mentioned earlier, it involved hypothesizing a period
size and only recognizing relations between corresponding elements of
adjacent periods. However, by making the strategy for recognizing relations
a little more sophisticated it should be possible to create a single
unified program which can learn series concepts based on both inter-period
and intra-period relations.

The  production  system  learning  technique  was  presented  primarily  as 
an  artificial  intelligence  implementation  of  sequence  extrapolation,  rather 
than  as  a model  of  human  problem  solving.  In  fact,  comparison  with  Restle's 
work  indicates  that  the  learning  technique  may  more  closely  model  human 
sequence  prediction  than  sequence  extrapolation,  even  though  the  two  are 
very  closely  related.  Since  both  extrapolation  and  prediction  require 
pattern  acquisition,  it  might  be  useful  to  examine  the  implications  of 
this learning technique for a general theory of human serial pattern
acquisition. First, it implies that some portion of human long term memory
is organized in the form of a production system or set of condition-action
rules. Second, it implies that these rules have an order imposed on them,
i.e., given any rule, one can find the next rule in the list. However,
this ordering does not imply that conditions on rules are accessed serially.
The matching of production rule conditions against data in short term
memory is considered to proceed in parallel*, leading to a set of rules
whose conditions match the data. A single rule is chosen from this
conflict set on the basis of relative location in long term memory, and its
actions are executed. Thus response latency is not necessarily
proportional to production system size. Third, the learning technique
implies a memory for the locations of rules recently fired. This follows
from the necessity of incorporating new rules into the system in front of
error-causing rules. Finally, it implies that learning serial patterns
involves a liberal use of memory capacity in order to reduce computational
complexity.

It is felt that this learning technique can be generalized to other
induction-type tasks. A similar, though much simpler, technique has already
been used in a production rule simulation of verbal learning (Waterman,
1974). Another similar, but more complex, technique has been used in a
production system program which learns heuristics for draw poker (Waterman,
1970).

*This is currently implemented as a simple serial process.